This is a report of an analysis of West Texas Intermediate (WTI) crude oil prices from April 4th, 2012, to April 4th, 2022[1]. The purpose of the analysis is twofold. On the one hand, it forecasts log-returns. On the other, it calculates the Value at Risk (VaR) and Expected Shortfall(ES)[2].

Introduction

The above plot shows the price of WTI throughout a ten-year period. One can easily notice the significant drop in prices in 2020 when the price hit a low of 36 dollars per barrel, compared with 104 dollars per barrel in july 2014. Thus, the price of oil fell by more than 50%. The purpose of risk measures such as VaR and ES is to protect investors from drawdowns such as the one in 2020 by anticipating the possible losses on a particular day of trade in the future.

Given that the VaR and ES require an estimation of the prices to be calculated. Firstly, we should identify the distribution of our time series. Secondly, we should determine whether there are correlations in the data that need to be taken into account. These two steps have to be made in order to build a robust model of the oil price. Once, we have identified the distribution and whether there are correlations in the data. We built a model based on that information to forecast the price of oil. Then, we calculate the VaR and ES and run backtesting to estimate how accurate the VaR is.

The probability distribution of the oil price

Let us transform our time series into a stationary one by taking the log return of the sequence.

## [1] "DCOILWTICO"

We can see on the plot above the stationary version of our return series. Now it has a constant mean value of 0. Given that the standard normal distribution has been proposed to analyze log-returns [3]. We should first test whether the distribution of our data can be modeled using this distribution.

Jaque- Bera test

We can perform the Jaque- Bera test on the log-returns of the oil prices to determine whether the times series is normally distributed or not. The test measures whether a given times series has the kurtosis and skewness of the standard normal distribution using P-values. On the one hand, if the P-value is close to 1, we do not reject the null hypothesis (the data is normally distributed). On the other hand, if the P-value is close to cero, we reject the null hypothesis, that is, the data is not normally distributed.

The Jaque- Bera test has returned a P-value < 2.2e-16, which is very close to cero. Therefore, we reject the null hypothesis. Given that the normal distribution won´t help us to model our data. We analyze the moments of the distribution of our log-returns in search of clues on how to model it.

Moment’s log-returns oil prices: • Mean: 0.000294 • Standard deviation: 0.030252 • Skewness: 1 • Kurtosis: 41.31136

Student T distribution

The Student T distribution has a mean value of 0 and Skewness of 0, and the Standard deviation and the kurtosis depend on the value of a parameter X. Thus, the Student T distribution has two out of the four parameters required to model the data.

The standard deviation of the Student T distribution is the ratio of the square root of X/ X-2 which is defined for values of X larger than 2. The kurtosis is given by the following expression 3 + 6/X-4 defined for values of X greater than 4.

Given the dependence of the kurtosis on the parameter X. We could solve the following equation to find the value of X that matches our data: 3 + (6/X-4) = 41. In our case, X should be around 0.22. Similarly, we could estimate a value of X for the standard deviation of the Student T distribution to match the one on our time series.

Nonetheless, the fact that both the standard deviation and the kurtosis depend on the value of X leads us to a conundrum. We could find either the value X to match the standard deviation or the kurtosis of the oil price, but it would be very difficult to find a value of X that will give us both moments simultaneously.

In order to solve our problem, we need to transform the Student T distribution by standardizing the standard deviation. Let’s call this distribution the rescaled Student T distribution with the following

Moment’s rescaled Student T distribution: • Mean: 0 • Variance: 1 • Skewness: 0
• Kurtosis: 3 + 6/X-4

We can now choose a parameter X to match the kurtosis of our data, and separately select a shrink parameter to match the standard deviation of it. Therefore, we can model three out of four moments in our data with the rescaled T distribution. The only parameter with can not emulate using the rescaled T distribution is the Skewness. Nonetheless, it is more important for our purposes to model the Kurtosis in the data that Skewness because the VaR and ES depend on the former not on the later. Thus, we model the oil prices with the rescaled T distribution.

Given that our first task has been achieved, that is, to find a suitable candidate for the distribution of our data. We turn our attention to determining whether there are correlations in our time series, or in other words, we want to know if the “past” has an impact on the “present”.

Volatility Clustering

Correlations in a time series are detected with the Autocorrelation Function. Luckily for us, the ACF function in the R programing Language can be used to obtain the Autocorrelation Function for our data.

## [1] "DCOILWTICO"

We can tell whether there are correlations with previous time-lags by paying attention to the black bars between the blue bards. If a black bar is not contained between the blue lines (the 95% confidence interval), then we have an indication that the “past” has an impact on the “present”.

If we represent every day of trade as a random variable, and we select a particular day as the “present”, and the variables presiding that day as the “past”. The plot shows that the only correlation in our data is the variable that represents the present, but not with any other variables in the past. Thus, the “past” seems not to have an impact on the “present”.

## [1] "DCOILWTICO"

The plot above shows the ACF of the absolute value of our sequence of returns, that is, there are no negative values. Here the graph shows significant evidence of autocorrelation. Moreover, it shows us evidence of volatility clustering or that the data does not have a constant standard deviation [4]. Therefore, the “past” does have an impact on the present. We can use a GARCH model[5] to capture the volatility clustering in the data.

Building the model

Given that the rescaled T distribution seems to fit the distribution on our data, and then our time series shows signs of volatility clustering. We used the ugarchspec function from the rugarch package in R to build a GARCH (1,1) model with rescale T distribution errors to simulate the behavior of the oil price.

In the plot above, we can observe how well our GARCH(1,1) model fits the data. The blue lines represent the fitted standard deviation extracted from the GARCH model, while the black line is the log-returns series of the oil price. We can see that the model does follow the standard deviation path of the log-return series. Nonetheless, the model tends to over-predict the variance because it responds slowly to large, isolated returns [6].

VaR and ES

The GARCH (1,1) model can be used to calculate VaR and ES ahead in the future, in our case we will forecast one day of trade. We use the ugarchboot function of the rugarch package to run the simulation.

##      5% 
## -7.5006
## [1] -10.9222

The VaR is the amount that a portfolio might lose, given a certain probability over a particular time period, in our case one day. ES is the average of those losses that are worse than the VaR. The VaR for our data at the 95% confidence interval is around 7.5% while the ES is around 10.9%.

Let us look at an example to put the values of VaR and ES into perspective. Take a hedge fund that has invested 1 billion dollars into the oil market. The VaR implies that 5% of the time the hedge fund may lose 72 millions of US dollars of their invested capital. If the returns of oil prices are worse than the VaR, then the average loss for the hedge fund is around 103 million of US dollars of their invested capital.

##     5% 
## -72.26
## [1] -103.47

VaR backtesting

Finally, we test how effective the forecast of VaR is. That is: if the returns of the oil prices are lower than the VaR only 5% of the time. We can use the ugarchroll function and data from 2014 until 2021 to simulate a year ahead of returns until 2022.

## [1] 1.2

If the VaR was calculated correctly, then we should expect not more than 5% of the returns being lower than the VaR. In our case, only 1.2 % of the returns breached the forecast VaR.Thus, our model is an accurate measure of risk for most days of trading at the 95% confidence interval

Conclusion

We analyze the data for the West Texas Intermediate (WTI) crude oil prices to find a statistical model of it. The purpose of such research is to predict the VaR and ES based on the future price of oil. The analysis led us to conclude that a GARCH(1,1) with rescaled T distribution errors was a good candidate to simulate our data. Once the model has been constructed, we calculate VaR and ES for a day ahead, which was accurate more than 95% of the time.

Bibliography and Notes

[1] Federal Reserve of Saint Louis is the source of the data used in this Analysis

[2] Conditional Value at Risk (cVaR), average value at Risk(AVaR), and expected tail loss are equivalent concepts to Expected shortfall.

[3] OSBORNE, M. F. M. “Brownian Motion in the Stock Market.” Operations Research, Vol. 7, No. 2 (March-April 1959), pp. 145-173. Also in P. H. Cootner, The Random Character of Stock Market Prices, Cambridge, M.I.T. Press, 1967, pp. 100-128.

[4] Ngai, Hang Chan. Times Series: Applications to Finance. A JOHN WILEY & SONS, INC., PUBLICATION. New York, 2020. Pp 101-116

[5] GARCH stands for generalized autoregressive conditional heteroskedasticity. The GARCH model is a generalization of the ARCH model proposed by Robert F Engle ,

[6] Shumway, Robert H & Stoffer David S. Times Series Analysis and Its Applications with R examples. Springer. London 2011.Pp 280-289